copent: Estimating Copula Entropy and Transfer Entropy in R
MA Jian, PhD
Tsinghua University
majian03@gmail.com
useR! 2021
Contents
1. Introduction
2. Theory and Estimation
   Copula Theory
   Theory of Copula Entropy
   Estimating Copula Entropy and Transfer Entropy
3. Implementation
   Overview
   Functions
4. Examples
   Variable Selection
   Causal Discovery
5. Summary
Introduction
Statistical independence and conditional independence are two fundamental concepts in statistics and machine learning, with many applications in different areas.
Copula Entropy (CE) is a mathematical concept for multivariate statistical independence testing; it has several desirable properties and can be estimated nonparametrically.
Transfer Entropy (TE), a tool for measuring causality, can be represented with CE alone and can therefore also be estimated nonparametrically via CE.
The copent package in R implements the above methods for estimating CE and TE.
This talk introduces the implementation of the package and compares it with other related methods implemented in R for (conditional) independence testing, using two real-world data examples.
Theory and Estimation Copula Theory
Copula Theory
Theorem (Sklar’s Theorem) [a]
Given a random vector $\mathbf{X} = (X_1, \ldots, X_N)$, its PDF $p(\mathbf{x})$ can be represented as
$$p(\mathbf{x}) = c(\mathbf{u}) \prod_{i=1}^{N} p_i(x_i), \tag{1}$$
where $\mathbf{u} = \{u_i\}$ are the marginal distribution functions of $\mathbf{X}$, $\{p_i, i = 1, \ldots, N\}$ are the marginal density functions of $\mathbf{X}$, and $c$ is the copula density.
[a] M. Sklar. “Fonctions de répartition à n dimensions et leurs marges”. In: Publ. Inst. Statist. Univ. Paris 8 (1959), pp. 229–231.
The core of copula theory:
separating the representation of dependence from the properties of individual variables
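To make the separation concrete, here is a small base-R sketch (not part of copent; sample size and the correlation 0.7 are arbitrary choices): it draws a correlated Gaussian pair, maps each margin through its own CDF to obtain $u_i$, and checks that the margins become uniform while rank-based dependence is untouched.

```r
# Probability integral transform: u_i = F_i(x_i) removes the marginals
# and leaves only the dependence structure (the copula).
set.seed(1)
n   <- 2000
rho <- 0.7
x1  <- rnorm(n)
x2  <- rho * x1 + sqrt(1 - rho^2) * rnorm(n)  # standard normal, corr(x1, x2) = rho

u1 <- pnorm(x1)  # marginal CDF of x1 (standard normal)
u2 <- pnorm(x2)  # marginal CDF of x2 (standard normal)

# Margins of u are approximately Uniform(0, 1) ...
print(c(mean(u1), mean(u2)))

# ... while rank-based dependence is identical, since pnorm is strictly increasing.
print(c(cor(x1, x2, method = "spearman"), cor(u1, u2, method = "spearman")))
```

With real data the marginal CDFs are unknown, which is why the estimation method described later replaces them with rank statistics.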
Theory and Estimation Theory of Copula Entropy
Definition and Theorem
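Definition (Copula Entropy)
Let $\mathbf{X}$ be random variables with marginals $\mathbf{u}$ and copula density $c(\mathbf{u})$. The CE of $\mathbf{X}$ is defined as
$$H_c(\mathbf{x}) = -\int_{\mathbf{u}} c(\mathbf{u}) \log c(\mathbf{u}) \, d\mathbf{u}. \tag{2}$$
Theorem [1]
The mutual information of $\mathbf{X}$ is equivalent to its negative CE:
$$I(\mathbf{x}) = -H_c(\mathbf{x}). \tag{3}$$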
A theory of statistical independence measures
The bridge between copula theory and information theory
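The step from Sklar’s theorem (1) to (3) can be filled in with a short derivation: substitute (1) into the definition of mutual information and change variables $u_i = F_i(x_i)$, where $F_i$ (written $u_i$ in (1)) is the marginal distribution function of $X_i$, so that $du_i = p_i(x_i)\,dx_i$.

```latex
\begin{aligned}
I(\mathbf{x}) &= \int p(\mathbf{x}) \log \frac{p(\mathbf{x})}{\prod_{i=1}^{N} p_i(x_i)} \, d\mathbf{x}
  && \text{(definition of multivariate MI)} \\
&= \int c(\mathbf{u}) \prod_{i=1}^{N} p_i(x_i) \, \log c(\mathbf{u}) \, d\mathbf{x}
  && \text{(by (1))} \\
&= \int_{\mathbf{u}} c(\mathbf{u}) \log c(\mathbf{u}) \, d\mathbf{u}
  && (du_i = p_i(x_i)\,dx_i) \\
&= -H_c(\mathbf{x})
  && \text{(by (2))}
\end{aligned}
```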
[1] Jian Ma and Zengqi Sun. “Mutual information is copula entropy”. In: Tsinghua Science & Technology 16.1 (2011), pp. 51–54. See also arXiv preprint arXiv:0808.0845 (2008).
Properties and Comparison
Axiomatic properties of CE
multivariate
symmetric
non-negative, 0 iff independence
invariant to monotonic transformation
equivalent to the correlation coefficient in the Gaussian case
An ideal measure compared with the others
Table: Comparison with other independence measures.
Property | CE | Distance Correlation | HSIC
Definition | copula-based | generalised correlation | correlation in RKHS
Multivariate | Yes | distance multivariance | dHSIC
Invariance | monotonic transformation | No | No
Gaussian case | equivalent to correlation coefficient | unclear | unclear
Computation | low | high | high
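For the Gaussian case the equivalence can be written out: for a bivariate normal with correlation coefficient $\rho$, the mutual information is $-\frac{1}{2}\log(1-\rho^2)$, so by (3) the negative CE is a strictly increasing function of $|\rho|$. The helper below has a hypothetical name (it is not part of copent) and simply evaluates that closed form.

```r
# Negative copula entropy (= mutual information) of a bivariate Gaussian
# with correlation rho, via the closed form -0.5 * log(1 - rho^2).
gaussian_neg_ce <- function(rho) {
  stopifnot(all(abs(rho) < 1))
  -0.5 * log(1 - rho^2)
}

rhos <- c(0, 0.3, 0.6, 0.9)
print(round(gaussian_neg_ce(rhos), 4))  # zero at rho = 0, growing with |rho|
```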
Theory and Estimation Estimating Copula Entropy and Transfer Entropy
Estimating CE
Nonparametric Estimation Method [2]
1. Estimate the empirical copula density with rank statistics;
2. Estimate CE with the KSG entropy estimation method.
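Written out, and assuming the standard Kozachenko-Leonenko form that KSG builds on (notation introduced here: $T$ samples in $d$ dimensions, $\psi$ the digamma function, $\epsilon_j$ twice the max-norm distance from sample $j$ to its $k$-th nearest neighbour, so that the unit-ball constant is $c_d = 1$), the two steps amount to:

```latex
\hat{u}_i^{(j)} = \frac{1}{T} \sum_{t=1}^{T} \mathbf{1}\!\left( x_i^{(t)} \le x_i^{(j)} \right)
\quad \text{(rank of } x_i^{(j)} \text{ divided by } T\text{)},
\qquad
\hat{H}_c = -\psi(k) + \psi(T) + \log c_d + \frac{d}{T} \sum_{j=1}^{T} \log \epsilon_j ,
```

where the entropy estimate is computed on the copula samples $\hat{\mathbf{u}}^{(j)}$, and $-\hat{H}_c$ then estimates the mutual information by (3).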
Advantages
distribution-free, non-parametric
tuning-free, insensitive to parameters
good convergence
easy to implement
low computation burden
[2] Jian Ma and Zengqi Sun. “Mutual information is copula entropy”. In: Tsinghua Science & Technology 16.1 (2011), pp. 51–54. See also arXiv preprint arXiv:0808.0845 (2008).
Estimating Transfer Entropy via CE
Definition (Transfer Entropy) [a]
Let $x_t, y_t$ be two time series observations at time $t = 1, \ldots, N$ of the processes $X_t, Y_t$. The TE $T_{Y \to X}$ from $Y$ to $X$ is defined as
$$T_{Y \to X} = \sum p(x_{t+1}, x_t, y_t) \log \frac{p(x_{t+1} \mid x_t, y_t)}{p(x_{t+1} \mid x_t)}. \tag{4}$$
[a] Thomas Schreiber. “Measuring information transfer”. In: Physical Review Letters 85.2 (2000), p. 461.
1. CE representation of TE
Ma [3] proved that TE can be represented with CE alone, as follows:
$$T_{Y \to X} = -H_c(x_{t+1}, x_t, y_t) + H_c(x_{t+1}, x_t) + H_c(y_t, x_t). \tag{5}$$
2. Nonparametric estimation of TE via CE
   (1) Estimate the three CE terms in (5);
   (2) Calculate TE from the estimated CE terms.
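Why (5) holds can be sketched as follows, assuming the standard identity that (4) equals the conditional mutual information $I(x_{t+1}; y_t \mid x_t)$, and using (3) to write each CE term as a negative multivariate mutual information in terms of ordinary (differential) entropies $H(\cdot)$:

```latex
\begin{aligned}
T_{Y \to X} &= I(x_{t+1}; y_t \mid x_t)
  = H(x_{t+1}, x_t) + H(y_t, x_t) - H(x_{t+1}, x_t, y_t) - H(x_t), \\
-H_c(x_{t+1}, x_t, y_t) &= H(x_{t+1}) + H(x_t) + H(y_t) - H(x_{t+1}, x_t, y_t), \\
-H_c(x_{t+1}, x_t) &= H(x_{t+1}) + H(x_t) - H(x_{t+1}, x_t), \\
-H_c(y_t, x_t) &= H(y_t) + H(x_t) - H(y_t, x_t).
\end{aligned}
```

Combining the last three lines with signs $+1, -1, -1$ (the right-hand side of (5)), $H(x_{t+1})$ and $H(y_t)$ cancel and the three $H(x_t)$ terms net to $-H(x_t)$, which leaves exactly the first line.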
[3] Jian Ma. “Estimating Transfer Entropy via Copula Entropy”. In: arXiv preprint arXiv:1910.04375 (2019).
Implementation Overview
The copent Package: Overview
The copent package for estimating CE was developed during the author’s PhD study at Tsinghua University and was first released on CRAN on April 16, 2020. It currently implements the methods for estimating CE and TE.
Latest version: 0.2
Includes 5 functions
Table: The functions in the package.
Function | Description
construct_empirical_copula(x) | constructs the empirical copula function from data x based on rank statistics
entknn(x,k,dt) | estimates entropy from data x with the KSG method
copent(x,k,dt) | main function; estimates CE by calling the above two functions
ci(x,y,z,k,dt) | tests conditional independence between (x,y) conditioned on z
transent(x,y,lag,k,dt) | estimates TE from y to x with time lag lag
Note: k and dt are the arguments for the k-th nearest neighbour and the distance type of the KSG algorithm, respectively.
Implementation Functions
Functions for Estimating CE
1. construct_empirical_copula
This function estimates the copula density from data with rank statistics.
construct_empirical_copula(x)
2. entknn
This function implements the KSG method for estimating entropy.
entknn(x, k = 3, dt = 2)
3. copent
The main function, which implements the nonparametric method for estimating CE. It returns the negative CE for convenience.
copent <- function(x, k = 3, dt = 2){
  xc = construct_empirical_copula(x)  # rank-based empirical copula
  -entknn(xc, k, dt)                  # negative entropy of the copula sample
}
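The slide shows only the signatures of construct_empirical_copula and entknn. The base-R sketch below is one way the two steps could be written without the package; the names rank_copula, knn_entropy_max and neg_ce are hypothetical, the distance is fixed to the max norm, and $\epsilon$ is taken as the radius to the $k$-th neighbour (so the log unit-ball volume is $d \log 2$). It builds a full distance matrix, so it is only meant for small samples.

```r
# Step 1: empirical copula via rank statistics (pseudo-observations in (0, 1]).
rank_copula <- function(x) {
  x <- as.matrix(x)
  apply(x, 2, function(col) rank(col, ties.method = "first") / length(col))
}

# Step 2: Kozachenko-Leonenko / KSG-style kNN entropy estimate, max-norm distance.
knn_entropy_max <- function(x, k = 3) {
  x <- as.matrix(x)
  n <- nrow(x)
  d <- ncol(x)
  dmat <- as.matrix(dist(x, method = "maximum"))              # pairwise Chebyshev distances
  diag(dmat) <- Inf                                           # exclude each point itself
  eps <- apply(dmat, 1, function(r) sort(r, partial = k)[k])  # k-th neighbour radius
  # psi(n) - psi(k) + log(volume of unit max-norm ball) + (d / n) * sum(log eps)
  digamma(n) - digamma(k) + d * log(2) + d * mean(log(eps))
}

# Negative CE: entropy of the copula sample, with the sign flipped.
neg_ce <- function(x, k = 3) -knn_entropy_max(rank_copula(x), k)

set.seed(42)
n <- 500
a <- rnorm(n)
indep <- cbind(a, rnorm(n))            # independent pair
dep   <- cbind(a, a + 0.1 * rnorm(n))  # strongly dependent pair
print(c(independent = neg_ce(indep), dependent = neg_ce(dep)))
```

The independent pair should give a value near zero and the dependent pair a clearly positive one, matching the "non-negative, 0 iff independence" property up to estimation error.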
Function for Conditional Independence Testing
1. ci
This function implements the method for testing conditional independence between (x,y) conditioned on z by calling the function copent three times according to (5).
ci <- function(x, y, z, k = 3, dt = 2){
  xyz = cbind(x, y, z)
  xz = cbind(x, z)
  yz = cbind(y, z)
  # copent() returns negative CE, so this is -Hc(xyz) + Hc(xz) + Hc(yz)
  copent(xyz, k, dt) - copent(xz, k, dt) - copent(yz, k, dt)
}
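A self-contained base-R sketch of the same three-term combination, useful for checking the sign convention: with a common driver z, x and y_ci are dependent but conditionally independent given z, so the statistic should be near zero, whereas a direct link from x to y_dep should raise it. The helper names and the max-norm kNN estimator are assumptions that mirror, but are not, the package code.

```r
# Compact negative-CE estimator (rank copula + max-norm kNN entropy),
# a stand-in for copent() so that this block runs without the package.
neg_ce <- function(x, k = 3) {
  u <- apply(as.matrix(x), 2, function(col) rank(col, ties.method = "first") / length(col))
  n <- nrow(u)
  d <- ncol(u)
  dm <- as.matrix(dist(u, method = "maximum"))
  diag(dm) <- Inf
  eps <- apply(dm, 1, function(r) sort(r, partial = k)[k])
  -(digamma(n) - digamma(k) + d * log(2) + d * mean(log(eps)))
}

# Same combination as ci(): negCE(x,y,z) - negCE(x,z) - negCE(y,z).
ci_sketch <- function(x, y, z, k = 3) {
  neg_ce(cbind(x, y, z), k) - neg_ce(cbind(x, z), k) - neg_ce(cbind(y, z), k)
}

set.seed(7)
n <- 500
z <- rnorm(n)
x <- z + 0.5 * rnorm(n)
y_ci  <- z + 0.5 * rnorm(n)  # linked to x only through the common driver z
y_dep <- x + 0.2 * rnorm(n)  # linked to x directly

stat_ci  <- ci_sketch(x, y_ci, z)
stat_dep <- ci_sketch(x, y_dep, z)
print(c(conditionally_independent = stat_ci, directly_dependent = stat_dep))
```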
Function for Estimating TE
1. transent
This function implements the method for estimating TE from y to x with time lag lag, simply by calling the function ci after preparing the data according to lag.
transent <- function(x, y, lag = 1, k = 3, dt = 2){
  l = length(x)
  x1 = x[1:(l-lag)]      # x_t
  x2 = x[(lag+1):l]      # x_{t+lag}
  y1 = y[1:(l-lag)]      # y_t
  ci(x2, y1, x1, k, dt)  # dependence of x_{t+lag} on y_t given x_t
}
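The data preparation in transent is index bookkeeping only. This toy base-R check (the helper name align_lag is hypothetical) reproduces the same slicing and shows which triples are formed: each $x_{t+\text{lag}}$ is paired with $y_t$ and $x_t$, leaving $l - \text{lag}$ triples.

```r
# Reproduce transent's index bookkeeping on toy vectors.
align_lag <- function(x, y, lag = 1) {
  l <- length(x)
  list(x_future = x[(lag + 1):l],  # x2 in transent: x_{t+lag}
       y_past   = y[1:(l - lag)],  # y1: y_t
       x_past   = x[1:(l - lag)])  # x1: x_t (conditioning variable)
}

x <- 1:6
y <- 101:106
a <- align_lag(x, y, lag = 2)
str(a)
```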
Examples Variable Selection
Example I
Variable Selection [4][5]
[4] Jian Ma. “Variable Selection with Copula Entropy”. In: Chinese Journal of Applied Probability and Statistics (accepted). See also arXiv preprint arXiv:1910.12389 (2019).
[5] The code for this example is available at https://github.com/majianthu/aps2020.
Variable Selection with CE
CE-based method
Select variables by ranking their negative CE values with the target variable.
Other related measures in R
Hilbert-Schmidt Independence Criterion (HSIC): dHSIC
Distance Correlation: energy
Heller-Heller-Gorfine Tests of Independence: HHG
Hoeffding’s D Test: independence
Bergsma-Dassios T* sign covariance: independence
Ball Correlation: Ball
library(copent)       # Copula Entropy
library(energy)       # Distance Correlation
library(dHSIC)        # Hilbert-Schmidt Independence Criterion
library(HHG)          # Heller-Heller-Gorfine Tests of Independence
library(independence) # Hoeffding's D test or Bergsma-Dassios T* sign covariance
library(Ball)         # Ball correlation
UCI Heart Disease Data
The data set contains 4 databases collected from four different locations worldwide, with 899 samples without missing values in total. Each sample has 76 attributes concerning heart disease diagnosis (#58 is the diagnosis), 13 of which were recommended by professionals as clinically relevant.
# read one UCI heart-disease file: 75 numeric fields plus a trailing text field per record
scandata <- function(filename1, nl = 0){
  url1 = paste("http://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/", filename1, sep = "")
  data1 = scan(url1, nlines = nl, what = c(as.list(rep(0, 75)), list("")))
  l = length(data1[[1]])
  data1m = matrix(unlist(data1), l, 76)
  matrix(as.numeric(data1m[, 1:75]), l, 75)  # keep the 75 numeric fields
}
h1 = scandata("cleveland.data", 282*10)
h2 = scandata("hungarian.data")
h3 = scandata("switzerland.data")
h4 = scandata("long-beach-va.data")
heart1 = as.matrix(rbind(h1, h2, h3, h4))
Code of the Example
The dependence between attribute #58 (diagnosis) and each of the other attributes is estimated with the 6 measures as follows:
p = ncol(heart1)
ce58 = dcor58 = dhsic58 = hhg58 = ind58 = ball58 = rep(0, p)
for (i in 1:p){
  ce58[i] = copent(heart1[, c(i, 58)])
  dcor58[i] = dcor(heart1[, i], heart1[, 58])
  dhsic58[i] = dhsic(heart1[, i], heart1[, 58])$dHSIC
  Dx = as.matrix(dist((heart1[, i]), diag = TRUE, upper = TRUE))
  Dy = as.matrix(dist((heart1[, 58]), diag = TRUE, upper = TRUE))
  hhg58[i] = hhg.test(Dx, Dy, nr.perm = 500)$sum.chisq
  ind58[i] = hoeffding.D.test(heart1[, i], heart1[, 58])$Dn
  ball58[i] = bcor(heart1[, i], heart1[, 58])
}
Selection Results
The figures below show the dependence between #58 (diagnosis) and the other attributes. The red lines mark the dependence between #58 and #16, which is taken as the selection threshold for each measure.
Figure: Variables selected with the 6 measures. Panels: (a) CE, (b) dCor, (c) dHSIC, (d) HHG, (e) Hoeffding’s D, (f) Ball Correlation; each plots the dependence with #58 against the variable index.
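The selection rule can be written as a one-line filter: keep every attribute whose dependence score with the target is at least the score of the reference attribute #16, excluding the target itself. The sketch below uses a hypothetical helper and toy scores, and assumes ties with the threshold are kept (>=), which is consistent with #16 appearing in every list of selected variables.

```r
# Keep attributes whose score is >= the score of a reference attribute,
# dropping the target column itself.
select_by_reference <- function(scores, ref, target) {
  keep <- which(scores >= scores[ref])
  setdiff(keep, target)
}

# Toy scores for 6 attributes; attribute 6 plays the role of the target.
toy <- c(0.02, 0.10, 0.05, 0.07, 0.01, 0.90)
print(select_by_reference(toy, ref = 4, target = 6))
```

With the vectors from the loop, the analogous call would be select_by_reference(ce58, ref = 16, target = 58).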
Interpretability of the Selections
Number of selected recommended variables
CE selects more of the recommended, biomedically meaningful variables than the other measures do.
Table: Selected variables with the 6 measures.
Measure | Selected variables’ IDs | Recommended variables selected
CE | 3,4,6,7,9,12,16,28-32,38,40,41,44,51,59-68 | 11
dCor | 3,4,6,7,9,12,13,16,28-33,38,40,41,52,59-68 | 9
dHSIC | 3,4,6,7,9,12,13,16,25,29-32,38,40,41,44,59-68 | 10
HHG | 4,6,7,9,16,25,32,38,40,41,52 | 7
Hoeffding’s D | 4,5,8,9,13,16,17,23,26,27,38,39,45-50,52-54 | 4
Ball | 4,6,7,9,13,16,25,32,38,40,41,52 | 7
Recommendations | 3,4,9,10,12,16,19,32,38,40,41,44,51 | 13
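The last column can be recomputed directly from the listed IDs. This base-R snippet transcribes the table (expanding ranges such as 28-32) and counts, for each measure, how many of the 13 recommended variables it selected.

```r
# Selected variable IDs transcribed from the table (ranges expanded).
selected <- list(
  CE          = c(3, 4, 6, 7, 9, 12, 16, 28:32, 38, 40, 41, 44, 51, 59:68),
  dCor        = c(3, 4, 6, 7, 9, 12, 13, 16, 28:33, 38, 40, 41, 52, 59:68),
  dHSIC       = c(3, 4, 6, 7, 9, 12, 13, 16, 25, 29:32, 38, 40, 41, 44, 59:68),
  HHG         = c(4, 6, 7, 9, 16, 25, 32, 38, 40, 41, 52),
  HoeffdingsD = c(4, 5, 8, 9, 13, 16, 17, 23, 26, 27, 38, 39, 45:50, 52:54),
  Ball        = c(4, 6, 7, 9, 13, 16, 25, 32, 38, 40, 41, 52)
)
recommended <- c(3, 4, 9, 10, 12, 16, 19, 32, 38, 40, 41, 44, 51)

# Number of recommended variables each measure selected.
overlap <- sapply(selected, function(s) length(intersect(s, recommended)))
print(overlap)
```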
Examples Causal Discovery
Example II
Causal Discovery [6][7]
[6] Jian Ma. “Estimating Transfer Entropy via Copula Entropy”. In: arXiv preprint arXiv:1910.04375 (2019).
[7] The code for this example is available at https://github.com/majianthu/transferentropy.
Causal Discovery
Goal
To infer causality from time series data by estimating TE
Other Related Methods in R
Kernel-based Conditional Independence (KCI): CondIndTests
Conditional Distance Correlation (CDC): cdcsis
COnditional DEpendence Coefficient (CODEC): FOCI
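library(copent)       # Copula Entropy / Transfer Entropy
library(CondIndTests) # Kernel-based Conditional Independence (KCI)
library(cdcsis)       # Conditional Distance Correlation (CDC)
library(FOCI)         # COnditional DEpendence Coefficient (CODEC)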
UCI Beijing PM2.5 Data
Overview
Time & Location: hourly data from 2010-01-01 to 2014-12-31, including PM2.5 data from the US Embassy in Beijing and meteorological data from Beijing Capital International Airport.
Meteorological factors: dew point, temperature, pressure, cumulated wind speed, combined wind direction, cumulated hours of snow, cumulated hours of rain.
Experimental data: the ’pressure’ factor is used in the example; 501 samples without missing values (2010-04-02 to 2010-04-23).
ucidata = read.csv("https://archive.ics.uci.edu/ml/machine-learning-databases/00381/PRSA_data_2010.1.1-2014.12.31.csv")
data = ucidata[2200:2700, c(6, 9)] # 6 (PM2.5), 9 (Pressure)
Code of the Example
The causality from pressure to PM2.5 with time lag from 1h to 24h is estimated
with the 4 measures as follows:
te1 = kci1 = cdc1 = codec1 = rep(0, 24)
for (lag in 1:24){
  pm25a = data[1:(501-lag), 1]  # PM2.5 at time t
  pm25b = data[(lag+1):501, 1]  # PM2.5 at time t + lag
  v1 = data[1:(501-lag), 2]     # pressure at time t

  te1[lag] = transent(data[, 1], data[, 2], lag)
  # equivalent: te1[lag] = ci(pm25b, v1, pm25a)

  kci1[lag] = KCI(pm25b, v1, pm25a)$testStatistic
  cdc1[lag] = cdcor(pm25b, v1, pm25a)
  codec1[lag] = codec(pm25b, v1, pm25a)
}
Results
Figure: Estimated causality from pressure to PM2.5 with lags from 1h to 24h. Panels: (a) TE via CE, (b) CDC, (c) KCI, (d) CODEC; each plots the estimated value against the lag (hours).
Summary
The theory of CE and the estimation methods of CE and TE are introduced.
copent, the R package for estimating TE and CE, is introduced with
implementation details.
The examples on variable selection and causal discovery demonstrate the
usage of the copent package and compare it with the related R packages.
References
1. Jian Ma and Zengqi Sun. “Mutual information is copula entropy”. In: Tsinghua Science & Technology 16.1 (2011), pp. 51–54. See also arXiv preprint arXiv:0808.0845 (2008).
2. Jian Ma. “Variable Selection with Copula Entropy”. In: Chinese Journal of Applied Probability and Statistics (accepted). See also arXiv preprint arXiv:1910.12389 (2019).
3. Jian Ma. “Estimating Transfer Entropy via Copula Entropy”. In: arXiv preprint arXiv:1910.04375 (2019).
4. Jian Ma. “copent: Estimating Copula Entropy and Transfer Entropy in R”. In: arXiv preprint arXiv:2005.14025 (2020).
http://arxiv.org/a/ma_j_3
Software
https://cran.r-project.org/package=copent
https://pypi.org/project/copent
https://github.com/majianthu
The copent package [8] for estimating copula entropy and transfer entropy is available in R on CRAN and in Python on PyPI. The source code is provided on GitHub.
[8] Jian Ma. “copent: Estimating Copula Entropy and Transfer Entropy in R”. In: arXiv preprint arXiv:2005.14025 (2020).
Enjoy the Power of Copula Entropy!